NSF PAR Search | NSF Public Access Repository

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).
What is a DOI Number?

Some links on this page may take you to non-federal websites. Their policies may differ from this site.

A Workflow for the Synthesis of Irregular Memory Access Microbenchmarks

https://doi.org/10.1145/3695794.3695816

Sheridan, Kevin; Dominguez-Trujillo, Jered; Shipman, Galen; Lavin, Patrick; Scott, Christopher; Vaca_Valverde, Agustin; Vuduc, Richard; Young, Jeffrey (September 2024, ACM)

Full Text Available
Multifidelity Memory System Simulation in SST

https://doi.org/10.1145/3631882.3631890

Lavin, Patrick; Young, Jeffrey; Vuduc, Richard (October 2023, ACM)
Online model swapping for architectural simulation

https://doi.org/10.1145/3457388.3458670

Lavin, Patrick; Young, Jeffrey; Vuduc, Richard; Beard, Jonathan (May 2021, The 18th International Conference on Computing Frontiers)
null (Ed.)
As systems and applications grow more complex, detailed computer architecture simulation takes an ever increasing amount of time. Longer simulation times result in slower design iterations which then force architects to use simpler models, such as spreadsheets, when they want to iterate quickly on a design. Simple models are not easy to work with though, as architects must rely on intuition to choose representative models, and the path from the simple models to a detailed hardware simulation is not always clear. In this work, we present a method of bridging the gap between simple and detailed simulation by monitoring simulation behavior online and automatically swapping out detailed models with simpler statistical approximations. We demonstrate the potential of our methodology by implementing it in the open-source simulator SVE-Cachesim to swap out the level one data cache (L1D) within a memory hierarchy. This proof of concept demonstrates that our technique can train simple models to match real program behavior in the L1D and can swap them in without destructive side-effects for the performance of downstream models. Our models introduce only 8% error in the overall cycle count, while being used for over 90% of the simulation and using models that require two to eight times less computation per cache access.
more » « less
Full Text Available
Evaluating Gather and Scatter Performance on CPUs and GPUs

Lavin, Patrick; Young, Jeffrey Young; Vuduc, Richard; Riedy, Jason; Vose, Aaron Vose; and Ernst, Daniel (January 2020, The International Symposium on Memory Systems (MEMSYS 2020))

This paper describes a new benchmark tool, Spatter, for assessing memory system architectures in the context of a specific category of indexed accesses known as gather and scatter. These types of operations are increasingly used to express sparse and irregular data access patterns, and they have widespread utility in many modern HPC applications including scientific simulations, data mining and analysis computations, and graph processing. However, many traditional benchmarking tools like STREAM, STRIDE, and GUPS focus on characterizing only uniform stride or fully random accesses despite evidence that modern applications use varied sets of more complex access patterns. Spatter is an open-source benchmark that provides a tunable and configurable framework to benchmark a variety of indexed access patterns, including variations of gather / scatter that are seen in HPC mini-apps evaluated in this work. The design of Spatter includes backends for OpenMP and CUDA, and experiments show how it can be used to evaluate 1) uniform access patterns for CPU and GPU, 2) prefetching regimes for gather / scatter, 3) compiler implementations of vectorization for gather / scatter, and 4) trace-driven "proxy patterns" that reflect the patterns found in multiple applications. The results from Spatter experiments show, for instance, that GPUs typically outperform CPUs for these operations in absolute bandwidth but not fraction of peak bandwidth, and that Spatter can better represent the performance of some cache-dependent mini-apps than traditional STREAM bandwidth measurements.
more » « less
Full Text Available
A microbenchmark characterization of the Emu chick

https://doi.org/10.1016/j.parco.2019.04.012

Young, Jeffrey S.; Hein, Eric; Eswar, Srinivas; Lavin, Patrick; Li, Jiajia; Riedy, Jason; Vuduc, Richard; Conte, Tom (September 2019, Parallel Computing)

Full Text Available
An Initial Characterization of the Emu Chick

Hein, Eric; Conte, Tom; Young, Jeffrey S.; Eswar, Srinivas; Li, Jiajia; Lavin, Patrick; Vuduc, Richard; Riedy, Jason (January 2018, IEEE International Parallel and Distributed Processing Symposium Workshops)

The Emu Chick is a prototype system designed around the concept of migratory memory-side processing. Rather than transferring large amounts of data across power-hungry, high-latency interconnects, the Emu Chick moves lightweight thread contexts to near-memory cores before the beginning of each memory read. The current prototype hardware uses FPGAs to implement cache-less “Gossamer” cores for doing computational work and a stationary core to run basic operating system functions and migrate threads between nodes. In this initial characterization of the Emu Chick, we study the memory bandwidth characteristics of the system through benchmarks like STREAM, pointer chasing, and sparse matrix vector multiply. We compare the Emu Chick hardware to architectural simulation and Intel Xeon-based platforms. While it is difficult to accurately compare prototype hardware with existing systems, our initial evaluation demonstrates that the Emu Chick uses available memory bandwidth more efficiently than a more traditional, cache-based architecture. Moreover, the Emu Chick provides stable, predictable performance with 80% bandwidth utilization on a random-access pointer chasing benchmark with weak locality.
more » « less
Full Text Available

Search for: All records